1. Abstract

For any distribution $\pi$ with support equal to $[n] = \{1,2,\ldots,n\}$, we study the
set $\mathcal{A}_\pi$ of tridiagonal stochastic matrices $K$ satisfying
$\pi(i)K[i,j] = \pi(j)K[j,i]$ for all $i,j \in [n]$. These matrices correspond to birth and
death chains with stationary distribution $\pi$. We study matrices $K$ drawn uniformly from
$\mathcal{A}_\pi$, following the work of Diaconis and Wood on the case $\pi(i) = \frac{1}{n}$.
We analyze a 'block sampler' version of their algorithm for drawing from $\mathcal{A}_\pi$ at
random, and use results from this analysis to draw conclusions about typical matrices. The
main result is a soft argument comparing cutoff for sequences of random birth and death
chains to cutoff for a special family of birth and death chains with the same stationary
distributions.

2. Introduction 

In [9], Diaconis and Wood study the collection $\mathcal{A}_\pi$, for $\pi$ the uniform
distribution on $[n]$, of $n$ by $n$ doubly-stochastic tridiagonal matrices. These matrices
are the transition kernels of birth and death (BD) chains with uniform stationary
distribution, and [9] uses detailed knowledge of this set to obtain information about
'typical' birth and death chains with uniform stationary distribution. In this paper, we
extend many of their results to birth and death chains with general stationary distribution.

There are two main contributions. Section 3 contains an analysis of a 'block sampler'
version of their algorithm for sampling from the set of transition kernels with a given
stationary distribution, and proves that the algorithm is quite efficient. Sections 4 to 6
contain a description of the cutoff phenomenon for random sequences of birth and death
chains with given stationary distributions. More precisely, Section 4 introduces the results
that will be used to describe cutoff, and Section 5 proves that some natural sequences of
random birth and death chains do exhibit cutoff. Our primary result, the statement of which
is slightly technical, is in Section 6. It uses a probabilistic comparison technique to
generalize the main result of [9], which proves the lack of cutoff for random birth and
death chains with uniform stationary distribution, to many other families of birth and
death chains.
We begin with some notation. Let $\pi$ be a distribution on $[n] = \{1,2,\ldots,n\}$,
and consider the set $\mathcal{A}_\pi$ of tridiagonal stochastic matrices $K$ satisfying
$\pi(i)K[i,j] = \pi(j)K[j,i]$ for all $i,j \in [n]$. This set may be viewed as a compact
convex subset of $\mathbb{R}^{n-1}$, with the matrices parameterized by their super-diagonal
entries $K[i,i+1]$. With the convention $c_0 = c_n = 0$, we associate to a sequence $c_i$
satisfying

(1) $0 \le c_i \le \min\left(1 - \frac{\pi(i-1)}{\pi(i)} c_{i-1},\ \frac{\pi(i+1)}{\pi(i)}(1 - c_{i+1})\right)$,

a matrix $K \in \mathcal{A}_\pi$ given by:

(2) $K[i,i+1] = c_i$

(3) $K[i+1,i] = \frac{\pi(i)}{\pi(i+1)} c_i$

(4) $K[i,i] = 1 - c_i - \frac{\pi(i-1)}{\pi(i)} c_{i-1}$

(5) $K[i,j] = 0, \quad |i-j| > 1$

It is easy to see that there is a similar parameterization by the sub-diagonal entries
$K[i+1,i]$. We will sometimes need to use this other representation.
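The parameterization above can be sketched in a few lines of code. The following is a minimal illustration, not part of the original paper: the function names `kernel_from_superdiagonal` and `is_reversible_kernel` and the example values of `pi` and `c` are hypothetical, and indices are 0-based, so `c[i]` plays the role of the super-diagonal entry between states $i+1$ and $i+2$.

```python
def kernel_from_superdiagonal(pi, c):
    """Build a tridiagonal kernel from super-diagonal entries c (length n-1)."""
    n = len(pi)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        K[i][i + 1] = c[i]                      # super-diagonal entry
        K[i + 1][i] = pi[i] / pi[i + 1] * c[i]  # forced by detailed balance
    for i in range(n):
        # the diagonal entry makes each row sum to one
        K[i][i] = 1.0 - sum(K[i][j] for j in range(n) if j != i)
    return K


def is_reversible_kernel(pi, K, tol=1e-12):
    """Check that K is stochastic and satisfies pi(i)K[i,j] = pi(j)K[j,i]."""
    n = len(pi)
    stochastic = all(
        abs(sum(row) - 1.0) < tol and min(row) >= -tol for row in K
    )
    balanced = all(
        abs(pi[i] * K[i][j] - pi[j] * K[j][i]) < tol
        for i in range(n) for j in range(n)
    )
    return stochastic and balanced


# Hypothetical example: a non-uniform stationary distribution on four states.
pi = [0.1, 0.2, 0.3, 0.4]
c = [0.5, 0.4, 0.3]
K = kernel_from_superdiagonal(pi, c)
```

If the chosen `c` violates the admissibility constraint, some diagonal entry becomes negative and `is_reversible_kernel` returns `False`, which gives a quick way to test membership in the set.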

We now recall some standard definitions from the theory of Markov chains. Let
$\mu$, $\nu$ be two distributions on the measure space $(\Omega, \mathcal{A})$. The total
variation distance between $\mu$ and $\nu$ is given by

$$\|\mu - \nu\|_{TV} = \sup_{A \in \mathcal{A}} |\mu(A) - \nu(A)|$$

Then the mixing profile of an ergodic Markov chain $X_t$ with stationary distribution
$\pi$ is given by

$$\tau(\epsilon) = \sup_{X_0 \in \Omega} \inf\{t : \|\mathcal{L}(X_t) - \pi\|_{TV} < \epsilon\}$$

for any $0 < \epsilon < 1$.

It is well-known that $d(t) = \sup_{X_0,Y_0 \in \Omega} \|\mathcal{L}(X_t) - \mathcal{L}(Y_t)\|_{TV}$
is submultiplicative and satisfies $d(t) \ge \|\mathcal{L}(X_t) - \pi\|_{TV}$. However, for
many well-known sequences of Markov chains, the distance to stationarity drops from very
close to 1 to very close to 0 over a period that is much smaller than the
$O(\tau(\frac{1}{4}))$ timescale suggested by submultiplicativity. This is known as the
cutoff phenomenon. More precisely, a sequence $X^{(n)}$ of Markov chains with mixing profile
$\tau_n(\epsilon)$ exhibits cutoff if

$$\lim_{n \to \infty} \frac{\tau_n(\epsilon)}{\tau_n(1-\epsilon)} = 1$$

for all $0 < \epsilon < 1$.
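For a chain on a finite state space, the total variation distance and the mixing profile can be computed directly from their definitions. The sketch below is illustrative only: the helper names are hypothetical, the two-state kernel is a toy example, and the code uses the fact that on a finite set the supremum over events equals half the $\ell^1$ distance.

```python
def tv_distance(mu, nu):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def step(mu, K):
    """One step of the chain: the row vector mu multiplied by the kernel K."""
    n = len(mu)
    return [sum(mu[i] * K[i][j] for i in range(n)) for j in range(n)]


def mixing_profile(K, pi, eps, t_max=100000):
    """Smallest t at which every deterministic start is within eps of pi.

    Distance to stationarity is non-increasing in t, so the first time the
    worst start is below eps equals the sup over starts of the first times.
    """
    n = len(pi)
    laws = [[1.0 if i == x else 0.0 for i in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(tv_distance(mu, pi) for mu in laws) < eps:
            return t
        laws = [step(mu, K) for mu in laws]
    raise ValueError("chain did not mix within t_max steps")


# Toy example: a lazy two-state chain with uniform stationary distribution.
K = [[0.75, 0.25], [0.25, 0.75]]
pi = [0.5, 0.5]
```

Evaluating `mixing_profile(K, pi, eps)` for several values of `eps` gives a numerical picture of how sharply the distance drops, which is what the cutoff ratio above measures along a sequence of chains.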

The cutoff phenomenon gained its current name in [1]. Over the following decade,
there was a great deal of effort to prove that certain natural families of Markov chains
exhibited cutoff (see [6] for a survey of early results). Almost all of these families
consisted of chains with a great deal of symmetry. Recently, there has been a great
deal of success in analyzing more complicated examples. Some of this progress, such as
Sly and Lubetzky's amazing paper [19], consists of detailed analyses of specific chains.
More importantly for this paper, general (and often easily checkable) necessary or
sufficient conditions for cutoff have been found [8], [5], [10].

This paper will rely on a characterization of Total Variation cutoff for birth and
death chains found in [10]. The necessary condition for cutoff, which applies for all
sequences of Markov chains, is as follows. Let $K_n$ be a sequence of kernels of Markov
chains satisfying $K_n[i,i] \ge \frac{1}{2}$ for all $1 \le i \le n$; these are called
$\frac{1}{2}$-lazy chains. Then, for a reversible chain, $K_n$ is similar to a symmetric
matrix and has all real and nonnegative eigenvalues, the largest of which is 1. Let
$\lambda_n$ be the second largest eigenvalue of $K_n$. Then let $\tau_n = \tau_n(\frac{1}{4})$
be its mixing time, and call $(1-\lambda_n)^{-1}$ its relaxation time. Then if
$\lim_{n \to \infty} \tau_n (1 - \lambda_n) < \infty$, the sequence of Markov chains doesn't
exhibit cutoff (see Proposition 18.4 of [18]). The much more difficult partial converse,
Theorem 1 of [10], is:



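Theorem 1 (Cutoff for BD Chains). For any fixed $0 < \epsilon < \frac{1}{2}$, there exists a
constant $C(\epsilon)$ so that for any $\frac{1}{2}$-lazy BD chain $X_t$ with spectral gap
$\gamma$,

$$\tau(\epsilon) - \tau(1-\epsilon) \le C(\epsilon) \sqrt{\frac{\tau(\frac{1}{4})}{\gamma}}$$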



The work on cutoff in this paper was inspired by the question of whether or not
cutoff is 'generic'. That is, do typical sequences of Markov chains exhibit cutoff?
There has been a great deal of work on this question for random walks on groups,
going back to work of Dou, Hildebrand and Wilson [11], [28] and described in the
survey paper [14]. More recently, this has progressed to other random walks with
uniform stationary distribution [20], [9]. To our knowledge, this paper contains the first
results describing cutoff for families of random chains with non-uniform stationary
distribution. Looking at non-uniform distributions poses new challenges, and our
techniques rely on comparing these non-uniform chains to related uniform chains.

Throughout this paper, the kernel $K$ is always chosen uniformly from the set
$\mathcal{A}_\pi$, but cutoff is always discussed for the associated $\frac{1}{2}$-lazy
kernel, $K' = \frac{1}{2}(I + K)$. This allows Theorem 1 to be applied. It is worth
mentioning, in light of Theorem 3.1 of [10], that there is nothing special about making the
chain exactly $\frac{1}{2}$-lazy. In particular, for any fixed $0 < \delta < 1$, all
theorems about the existence of cutoff for sequences of random chains
$K' = \frac{1}{2}(I + K)$ will also apply to sequences of random chains
$K'' = \delta I + (1 - \delta)K$, with modifications only to the cutoff location and bounds
on the window size. Similar arguments can be made to apply for the original chain $K$
in many cases, but this requires substantial extra calculation.
3. A Gibbs Sampler on $\mathcal{A}_\pi$

In Section 2.2 of [9], Diaconis and Wood propose several ways to sample from the
set $\mathcal{A}_\pi$. They include an exact algorithm, which seemed to be slow in practice,
and a Markov chain based algorithm without any rigorous running time bounds, which
seemed to be quick in practice. This section describes an algorithm closely related to
that used in [9], and provides a proof of its efficiency. Although the algorithm is quite
efficient, it is not trivial to implement well. Its analysis is included here to answer
a question posed in [9], and also because many of the lemmas proved in its analysis
are useful for proving the main theorems about cutoff in this paper. For a practical
discussion about implementing this sampler, a very different analysis resulting in a
novel perfect sampling algorithm, and an 0(?t, log(n)^) bound on the running time of 



the original algorithm, see [27].

Define the space

$$\mathcal{B}_\pi = \left\{ X \in \mathbb{R}^{n-1} : 0 \le X[i] \le \min\left(1 - \frac{\pi(i-1)}{\pi(i)} X[i-1],\ \frac{\pi(i+1)}{\pi(i)}(1 - X[i+1])\right) \right\}$$

with the convention $X[0] = X[n] = 0$. From equation (1), this is a parameterization
of $\mathcal{A}_\pi$. Furthermore, sampling uniformly from $\mathcal{B}_\pi$ is equivalent to
sampling uniformly from $\mathcal{A}_\pi$. Next, fix an integer $k > 0$ and weight $w > 0$.

We will define the following 'block' Gibbs sampler $X_t$ on $\mathcal{B}_\pi$, for all
$n > k$. At each step $t$, choose a coordinate $1 \le i(t) \le n-k$, according to the measure
$P[i(t) = 1] = P[i(t) = n-k] = \frac{w}{n-k-2+2w}$, and $P[i(t) = j] = \frac{1}{n-k-2+2w}$
for $j \ne 1, n-k$. Next, update the entries of $X_t$ as follows. For
$j \notin \{i(t)+1, \ldots, i(t)+k-1\}$, set $X_{t+1}[j] = X_t[j]$.
For the other entries, choose $U_t$ uniformly in $\mathcal{B}_\pi$ conditioned on
$U_t[j] = X_t[j]$ for all $j \notin \{i(t)+1, \ldots, i(t)+k-1\}$. Then, for all
$j \in \{i(t)+1, \ldots, i(t)+k-1\}$, set $X_{t+1}[j] = U_t[j]$. The Gibbs sampler
originally proposed in [9] corresponded to the choice $w = k = 1$.
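The simplest member of this family of samplers updates one coordinate at a time. The sketch below is an illustration under stated assumptions rather than the block sampler analysed in this section: it picks the coordinate uniformly at random (ignoring the endpoint weight $w$), and it relies on the observation that, for the uniform distribution on the constraint set, the conditional law of a single coordinate given its neighbours is uniform on an interval. The names `upper_limit`, `gibbs_step` and `in_polytope` are hypothetical, and indices are 0-based.

```python
import random


def upper_limit(pi, x, m):
    """Largest admissible value of x[m] given its two neighbours.

    x has length n-1 and x[m] stands for the (m+1)-th super-diagonal entry;
    missing neighbours at either end are treated as 0.
    """
    left = pi[m - 1] / pi[m] * x[m - 1] if m > 0 else 0.0
    right = x[m + 1] if m + 1 < len(x) else 0.0
    return min(1.0 - left, pi[m + 1] / pi[m] * (1.0 - right))


def gibbs_step(pi, x, rng):
    """Resample one uniformly chosen coordinate from its conditional law."""
    m = rng.randrange(len(x))
    x[m] = rng.uniform(0.0, upper_limit(pi, x, m))


def in_polytope(pi, x, tol=1e-12):
    """Check that every coordinate satisfies its two-sided constraint."""
    return all(
        -tol <= x[m] <= upper_limit(pi, x, m) + tol for m in range(len(x))
    )


# Hypothetical example: start from the all-zero point, which is admissible.
rng = random.Random(0)
pi = [0.1, 0.2, 0.3, 0.4, 0.0]
pi = [p / sum(pi[:4]) for p in pi[:4]]
x = [0.0] * (len(pi) - 1)
for _ in range(2000):
    gibbs_step(pi, x, rng)
```

Each update keeps the state inside the constraint set because the constraint linking two adjacent coordinates is the same inequality whichever of the two is being resampled.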

Let $\mathcal{L}(X)$ denote the distribution of the random variable $X$, and let $U_\pi$
denote the uniform distribution on $\mathcal{B}_\pi$. The main result of this section is:

Theorem 2 (Convergence of Block Dynamics). For $c > 0$, $k = w = 55$,
$t > 2cn\log(n)$, and $n > 58$,

$$\|\mathcal{L}(X_t) - U_\pi\|_{TV} \le n^{1-c}$$
Conversely, for any choices of k and w, any choice of c > 0, and t = — c)n\og{n), 

liminf ||£(Xt) - UnWrv > 

n—>oo 

The proof proceeds by a path-coupling argument. There are many variants of this
argument, which was introduced in [4]. In this proof, Theorem 19 of [22] will be used.
For a fixed transition kernel $K$ on a metric space $(X, d)$ of finite diameter
$\mathrm{diam}(X)$, the theorem may be stated as follows:

Theorem 3 (Path Coupling). Assume the metric space has the property that
for any $x,y \in X$, there exists a sequence of points $x = x_0, x_1, \ldots, x_k = y$ with the
properties $d(x_i, x_{i+1}) \le 1$ and $d(x,y) = \sum_{i=0}^{k-1} d(x_i, x_{i+1})$. Assume that
for all pairs of points $x,y$ with $d(x,y) \le 1$, it is possible to couple a pair of Markov
chains $(X_t, Y_t)$ started at $(X_0, Y_0) = (x,y)$, both with transition kernel $K$, so that
$E[d(X_1, Y_1)] \le \alpha\, d(x,y)$ for some $\alpha < 1$. Then, if $X_t$ and $Y_t$ are two
Markov chains with transition kernel $K$ and any initial distribution on $X$, they may be
coupled so that for all $t \ge 0$,

$$E[d(X_t, Y_t)] \le \alpha^t \sup_{p,q \in X} d(p,q)$$

To use the theorem, it is necessary to first define a metric and then prove a one-step
contraction estimate for nearby points. Define the following Hamming-like metric
on $\mathcal{B}_\pi$: $d(X,Y) = \sum_j 1_{X[j] \ne Y[j]}$. The goal is to find a 1-step
coupling for the block dynamics so that if $d(X_0, Y_0) = 1$, then
$E[d(X_1, Y_1)] \le \alpha < 1$ under the coupling.
Throughout this proof, all couplings will proceed by choosing the same block at each
step for both chains. The simple strategy is informed by the heuristic from the theory
of spin systems that strong spatial mixing (of the underlying object being sampled)
implies rapid temporal mixing (of a Gibbs sampler on that object); see [12] for one
explanation of the heuristic. In this case, the underlying object is the vector of
super-diagonal entries with geometry being that of the path, and of course the Gibbs
sampler is the block dynamics.

To turn the weak convergence bound of Theorem 3 into a bound on Total Variation
distance, we use the following standard lemma (see Theorem 5.2 of [18]):

Lemma 4 (Fundamental Coupling Lemma). Assume $(X_t, Y_t)$ is a coupling of Markov
chains such that if $X_s = Y_s$, then $X_t = Y_t$ for all $t > s$. Assume also that $X_0 = x$
and $Y_0$ is distributed according to the stationary distribution of $K$. Define the random
time $\tau$ to be the first time at which $X_t = Y_t$. Then
$\sup_{A} |K^t(x,A) - \pi(A)| \le P[\tau > t]$.

Assume for now that $X_0$ and $Y_0$ differ only at a single coordinate $j$ satisfying
$n - k > j > k + 1$. Smaller and larger values of $j$ will be evaluated later. The goal
now is to find a coupling which minimizes the expected distance after a single block
has been updated, j is inside the block with probability ^_^_^2+2t» • ^^^^ case, the 
two blocks are updated with exactly the same distribution, and so under the obvious 
coupling the distance decreases by 1. With probability the block ends at 

$j - 1$ or begins at $j + 1$. In this case, the distance generally increases, potentially
by as much as $k$. The goal is to find a coupling under which the distance is unlikely to
increase by a large number. The main step is the following lemma, which describes the
spatial mixing of coordinates for elements chosen from the uniform distribution on
$\mathcal{B}_\pi$:



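AARON SMITH

Lemma 5 (Super-Diagonal Mixing). Fix $1 \le i < j \le n-1$ and $0 \le b_1, b_2, a \le 1$. Let
$Z$ and $Q$ be chosen uniformly from $\mathcal{B}_\pi$ conditioned on $Z[i] = b_1$,
$Q[i] = b_2$ and $Z[j] = Q[j] = a$. Then, for $2\ell < j - 2$,

$$\|\mathcal{L}(Z[i+2\ell]) - \mathcal{L}(Q[i+2\ell])\|_{TV} \le \prod_{q=1}^{\ell} (1 - R(i+2q))$$

where $R(q) = 1 - [C(q) - 1]\log\left(\frac{C(q)}{C(q)-1}\right)$ and
$C(q) = 16\min\left(\frac{\pi(q+3)}{\pi(q+2)}, 1\right)$.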
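To get a feel for the size of the per-step coupling probability $R$ in Lemma 5, it helps to evaluate it numerically. The sketch below assumes that $R$ depends on a constant $C > 1$ through $R = 1 - (C-1)\log\frac{C}{C-1}$, as the formula in the lemma reads, and checks the elementary identity that this equals the integral of $\frac{1-Cx}{1-x}$ over $[0, 1/C]$. The function names and the sample values $C = 3$ and $C = 16$ are chosen for illustration.

```python
import math


def coupling_rate(C):
    """R as a function of the constant C > 1."""
    return 1.0 - (C - 1.0) * math.log(C / (C - 1.0))


def integral_form(C, steps=20000):
    """Midpoint-rule value of the integral of (1 - C x)/(1 - x) on [0, 1/C]."""
    h = (1.0 / C) / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += (1.0 - C * x) / (1.0 - x)
    return total * h


rate_small_C = coupling_rate(3.0)    # roughly 0.189
rate_large_C = coupling_rate(16.0)   # roughly 0.032
```

The rate decreases as $C$ grows, so a sharper bound on the constant translates directly into a faster decay of the product in the lemma.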



The proof of this lemma will occupy most of the remainder of this section. The
first step is the following easy coupling lemma. Let $F$ be a differentiable cumulative
distribution function on $[0,1]$ which satisfies

• (i) $F(x) \ge x$

• (ii) $F(x) \le Cx$ for some $C > 1$

• (iii) $F(x)$ is concave

Then let $\Psi$ be the cumulative distribution function on $[0, C^{-1}]$ with density
$\psi(x) = \frac{1}{Z}\frac{1 - Cx}{1 - x}$, where $Z = 1 - [C-1]\log\left(\frac{C}{C-1}\right)$
is a normalizing constant. Then

Lemma 6 (Minorization for Concave Lipschitz Distributions). For $F$ and $\Psi$ described
above, and $[a,b] \subset [0, C^{-1}]$,

$$F(b) - F(a) \ge Z(\Psi(b) - \Psi(a))$$

To prove this, note that

$$\cdots \le aF'(0) + (1-a)F'(a) \le aC + (1-a)F'(a)$$

where the first inequality is due to assumption iii), and the second is due to
assumption ii). Thus the stated inequality holds, proving the lemma. □

In order to use this, it is necessary to relate this coupling of a single entry in the
updated block to some sort of coupling of the entire updated block. The following
generalization of Theorem 4.1 of [9] provides a strong way to do so:

Theorem 7 (Markov Property for Super-Diagonal Entries). Let $B$ be chosen uniformly
from $\mathcal{B}_\pi$. Then for any $1 \le i \le n-2$, any real constants $a_1, \ldots, a_i$
in the interval $[0,1]$ and any $t \in [0,1]$,

(7) $P[B[i+1] \le t \mid B[1] = a_1, \ldots, B[i] = a_i] = P[B[i+1] \le t \mid B[i] = a_i]$
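The proof is the same as in the reference, and both the idea and notation are
copied. Define $x_i = B[i]$, and let $f_{1,i-1}$, $f_{i+1,n}$ and $g_{i+1,n}$ be the iterated
integrals defined there, taken over the admissible values of the entries on either side of
position $i$; the argument of $f_{i+1,n}$ and $g_{i+1,n}$ is the upper limit of integration
for $x_{i+1}$. The left-hand side of equation (7) can then be written as

$$\frac{f_{1,i-1}(1)\, f_{i+1,n}\left(\min\left(t,\ 1 - \frac{\pi(i)}{\pi(i+1)} a_i\right)\right)}{f_{1,i-1}(1)\, f_{i+1,n}\left(1 - \frac{\pi(i)}{\pi(i+1)} a_i\right)}$$

and the right-hand side of equation (7) can be written as

$$\frac{g_{i+1,n}\left(\min\left(t,\ 1 - \frac{\pi(i)}{\pi(i+1)} a_i\right)\right)}{g_{i+1,n}\left(1 - \frac{\pi(i)}{\pi(i+1)} a_i\right)}$$

Next note that, for any fixed $0 \le x \le 1$, $g_{i+1,n}(x)$ and
$f_{1,i-1}(1) f_{i+1,n}(x)$ agree up to a factor that does not depend on $x$. Thus,

$$P[K[i+1,i+2] \le t \mid K[i,i+1] = a_i] = \frac{g_{i+1,n}\left(\min\left(t,\ 1 - \frac{\pi(i)}{\pi(i+1)} a_i\right)\right)}{g_{i+1,n}\left(1 - \frac{\pi(i)}{\pi(i+1)} a_i\right)}$$

$$= \frac{f_{1,i-1}(1)\, f_{i+1,n}\left(\min\left(t,\ 1 - \frac{\pi(i)}{\pi(i+1)} a_i\right)\right)}{f_{1,i-1}(1)\, f_{i+1,n}\left(1 - \frac{\pi(i)}{\pi(i+1)} a_i\right)}$$

$$= P[K[i+1,i+2] \le t \mid K[1,2] = a_1, \ldots, K[i,i+1] = a_i]$$

This proves the lemma. □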

As an aside, several closely related lemmas are possible. For example, fix a
distribution $\pi$ on $[n]$, and a rooted tree $T = ([n], E)$. One could sample from all
kernels $K$ which satisfy $K[x,y] = 0$ if $(x,y) \notin E$, and which give rise to Markov
chains with stationary distribution $\pi$. Then the same factorization argument says that
for any vertex $v \in T$ with unique parent $p$, child $c_1$, and with edges
$(p_{1,1}, p_{1,2}), \ldots, (p_{m,1}, p_{m,2})$ such that any path from $p_{j,1}$ to $v$ passes
through $p_{j,2}$ and then $p$,

$$P[K[v,c_1] \le x \mid K[p,v] = a_0, K[p_{1,1},p_{1,2}] = a_1, \ldots, K[p_{m,1},p_{m,2}] = a_m] = P[K[v,c_1] \le x \mid K[p,v] = a_0]$$

Since less is known about general chains on trees, it is harder to apply this result to
the study of cutoff.

In order to use Lemma 6, the following two bounds on the entries of random
elements of $\mathcal{B}_\pi$ are necessary. First, we bound the distribution from above in
terms of uniform random variables on $[0,1]$:

Lemma 8 (Largeness). Let $B$ be chosen uniformly from $\mathcal{B}_\pi$. For any interval
$(a,b) \subset [0,1]$,

$$P\left[B[i] \in \left(a, \frac{a+b}{2}\right)\right] \ge P\left[B[i] \in \left(\frac{a+b}{2}, b\right)\right]$$

This remains true when conditioned on the particular values of any collection of other
entries, in the following sense. Let $\mathcal{F}_{i,j,u,v}$ be the event that $B[i] = u$ and
$B[j] = v$. Then:

$$P\left[B[i] \in \left(a, \frac{a+b}{2}\right) \,\Big|\, \mathcal{F}_{i-k_1,i+k_2,u,v}\right] \ge P\left[B[i] \in \left(\frac{a+b}{2}, b\right) \,\Big|\, \mathcal{F}_{i-k_1,i+k_2,u,v}\right]$$

To prove this, for $X \in \mathcal{B}_\pi$ with $X[i] = \frac{a+b}{2} + \epsilon$ and
$0 < \epsilon < \frac{b-a}{2}$, define $\hat{X}$ by $\hat{X}[i] = \frac{a+b}{2} - \epsilon$
and $\hat{X}[j] = X[j]$ for $j \ne i$. Then $\hat{X} \in \mathcal{B}_\pi$, since lowering one
entry can only loosen the constraints on its neighbours. The map $X \to \hat{X}$ is clearly
measure-preserving, and this proves the inequality with no conditioning. The proof
when conditioned on the particular values of other entries is identical: the 'hat' map
$X \to \hat{X}$ is a measure-preserving map from
$\{B \in \mathcal{B}_\pi : B[i] \in (\frac{a+b}{2}, b), B[i-k_1] = v_1, B[i+k_2] = v_2\}$ into
$\{B \in \mathcal{B}_\pi : B[i] \in (a, \frac{a+b}{2}), B[i-k_1] = v_1, B[i+k_2] = v_2\}$. □
Corollary 9. For $X$ chosen uniformly in $\mathcal{B}_\pi$,
$P[X[i] > x\min(1, \frac{\pi(i+1)}{\pi(i)})] \le (1 - x)$
for all $x \in [0,1]$. Again, this remains true when conditioned on the particular values
of any collection of other entries.

By Theorem 7, the conditioning result in Lemma 8 is quite general.
Next, we bound the distribution from below in terms of uniform random variables
on $[0,1]$:

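Lemma 10 (Smallness). Let $X$ be chosen uniformly from $\mathcal{B}_\pi$, and let
$\mathcal{F}_{i,j,u,v}$ be the event that $X[i] = u$ and $X[j] = v$. Then for any $D > 3$, and
fixed $a,b \in [0,1]$,

$$P\left[X[i+2] \le \frac{1}{D} \,\Big|\, \mathcal{F}_{i,i+4,a,b}\right] \le \frac{16}{D}\min\left(1, \frac{\pi(i+3)}{\pi(i+2)}\right)$$

Furthermore, if $\pi(i+2) \le \pi(i+3)$, then

$$P\left[X[i+2] \le \frac{1}{D} \,\Big|\, \mathcal{F}_{i,i+4,a,b}\right] \le \frac{3}{D}$$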



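Let $\alpha = \frac{1}{D}$, fix some $r > 0$ to be determined later, and choose $x_1$ and
$x_2$ satisfying $0 \le x_1 \le \alpha$ and $r\alpha \le x_2 \le (r+1)\alpha$. Then define

$$L(x) = \min\left(1 - \frac{\pi(i)}{\pi(i+1)}a,\ \frac{\pi(i+2)}{\pi(i+1)}(1 - x)\right)$$

and $R_1 = \frac{L(x_2)}{L(x_1)} \le 1$. There are three cases to consider to bound $R_1$.

Case 1:

$$1 - \frac{\pi(i)}{\pi(i+1)}a \le \frac{\pi(i+2)}{\pi(i+1)}(1 - x_2) \le \frac{\pi(i+2)}{\pi(i+1)}(1 - x_1)$$

In this case, $R_1 = 1$.

Case 2:

$$\frac{\pi(i+2)}{\pi(i+1)}(1 - x_2) \le \frac{\pi(i+2)}{\pi(i+1)}(1 - x_1) \le 1 - \frac{\pi(i)}{\pi(i+1)}a$$

In this case,

$$R_1 = \frac{1 - x_2}{1 - x_1} \ge 1 - (r+1)\alpha$$

Case 3:

$$\frac{\pi(i+2)}{\pi(i+1)}(1 - x_2) \le 1 - \frac{\pi(i)}{\pi(i+1)}a \le \frac{\pi(i+2)}{\pi(i+1)}(1 - x_1)$$

In which case

$$R_1 = \frac{\frac{\pi(i+2)}{\pi(i+1)}(1 - x_2)}{1 - \frac{\pi(i)}{\pi(i+1)}a} \ge \frac{1 - x_2}{1 - x_1} \ge 1 - (r+1)\alpha$$

Thus, in all three cases, $R_1 \ge 1 - (r+1)\alpha$. Next, define

$$U(x) = \min\left(1 - \frac{\pi(i+2)}{\pi(i+3)}x,\ \frac{\pi(i+4)}{\pi(i+3)}(1 - b)\right)$$

and define $R_2 = \frac{U(x_2)}{U(x_1)} \le 1$. Again, there are three cases.

Case 1:

$$\frac{\pi(i+4)}{\pi(i+3)}(1 - b) \le 1 - \frac{\pi(i+2)}{\pi(i+3)}x_2 \le 1 - \frac{\pi(i+2)}{\pi(i+3)}x_1$$

In this case, $R_2 = 1$.

Case 2:

$$1 - \frac{\pi(i+2)}{\pi(i+3)}x_2 \le 1 - \frac{\pi(i+2)}{\pi(i+3)}x_1 \le \frac{\pi(i+4)}{\pi(i+3)}(1 - b)$$

In this case,

$$R_2 = \frac{1 - \frac{\pi(i+2)}{\pi(i+3)}x_2}{1 - \frac{\pi(i+2)}{\pi(i+3)}x_1} \ge 1 - \frac{\pi(i+2)}{\pi(i+3)}(r+1)\alpha$$

Case 3:

$$1 - \frac{\pi(i+2)}{\pi(i+3)}x_2 \le \frac{\pi(i+4)}{\pi(i+3)}(1 - b) \le 1 - \frac{\pi(i+2)}{\pi(i+3)}x_1$$

In this case

$$R_2 = \frac{1 - \frac{\pi(i+2)}{\pi(i+3)}x_2}{\frac{\pi(i+4)}{\pi(i+3)}(1 - b)} \ge \frac{1 - \frac{\pi(i+2)}{\pi(i+3)}x_2}{1 - \frac{\pi(i+2)}{\pi(i+3)}x_1} \ge 1 - \frac{\pi(i+2)}{\pi(i+3)}(r+1)\alpha$$

So again, in all three cases, $R_2 \ge 1 - \frac{\pi(i+2)}{\pi(i+3)}(r+1)\alpha$.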

Next, note that $R_1$ gives the ratio of the length of the interval of allowed values of
$X[i+1]$ given $X[i] = a$ and $X[i+2] = x_2$ to the length of the interval of allowed
values of $X[i+1]$ given $X[i] = a$ and $X[i+2] = x_1$. Similarly, $R_2$ gives the
ratio of the length of the interval of allowed values of $X[i+3]$ given $X[i+4] = b$
and $X[i+2] = x_2$ to the length of the interval of allowed values of $X[i+3]$
given $X[i+4] = b$ and $X[i+2] = x_1$. The product of these two ratios is at least
$(1 - (r+1)\alpha)(1 - \frac{\pi(i+2)}{\pi(i+3)}(r+1)\alpha)$. Thus, it is possible to
construct a map $\phi = (\phi_1, \phi_2, \phi_3)$ from the set of triples
$(X[i+1], X[i+2], X[i+3])$ with $X[i+2] = x_2$ to the set of triples with $X[i+2] = x_1$
which is linear in the first and third coordinates, and where the product of the slopes of
the two linear parts is at most
$[(1 - (r+1)\alpha)(1 - \frac{\pi(i+2)}{\pi(i+3)}(r+1)\alpha)]^{-1}$.
This implies that

(8) $\frac{P[X[i+2] \in [r\alpha, (r+1)\alpha] \mid \mathcal{F}_{i,i+4,a,b}]}{P[X[i+2] \le \alpha \mid \mathcal{F}_{i,i+4,a,b}]} \ge (1 - (r+1)\alpha)\left(1 - \frac{\pi(i+2)}{\pi(i+3)}(r+1)\alpha\right)$

Assume first that $\pi(i+3) \le \pi(i+2)$. Then:





n(i+2)a 



P[X[t + 2] > a| J-i,i+4,a,fe] > ^[^[' + 2] e [ra, (r + l)a] \j^i,i+4,a,i 



r=l 



Using inequality (8) relating $P[X[i+2] \in [r\alpha, (r+1)\alpha] \mid \mathcal{F}_{i,i+4,a,b}]$
to $P[X[i+2] \le \alpha \mid \mathcal{F}_{i,i+4,a,b}]$, summing only the first half of the
terms, and bounding each of those terms by their minimum results in the bound

P[X[Z + 2] > a| J-,,+4,a,b] > (,+3) P[X[t + 2] < a| J-,,+4,a,6] 

l2aiT{i+2) J 

This proves the first inequality for $\pi(i+2) \ge \pi(i+3)$.

BIRTH AND DEATH CUTOFF

If, however, $\pi(i+3) > \pi(i+2)$, then
$P[X[i+2] \in [r\alpha, (r+1)\alpha] \mid \mathcal{F}_{i,i+4,a,b}] \ge (1 - r\alpha)^2 P[X[i+2] \le \alpha \mid \mathcal{F}_{i,i+4,a,b}]$,
and the above calculation can be made a little more careful:



P[X[t + 2] > a| J-i,i+4,a,f,] > J2 ^[^[' + 2] e [^«' + l)«]|-^M+4,a,6] 

> 5^ (1 - rafP[X[t + 2] < a| J-,,,+4,a,6] 



>P[X[« + 2] <a|j-,,,+4,a,6] 5^ ( 

r=l 

1 1 

3a 3 



> P[X[t + 2] < «|j;,.+4,aA 

So in fact P[X[i] < a] < 3j^. □ 

We note that, for a G 1], very similar calculations show: 



(9) P 



-^[^ + 1] < ^ I J^i,i+3,a,b 



1 . vr(z + 3) 
O — mm ' ^ 



V 7r(i + 2) 



We are finally ready to prove Lemma 5. By Lemmas 8 and 10, for $B$ chosen uniformly
from $\mathcal{B}_\pi$, the distribution $P[B[i+2] \le x \mid B[i] = a, B[i+4] = b]$ satisfies
the conditions of Lemma 6 with constant $C = C(i)$. Next, fix $1 \le i < j \le n$ with
$i + j \le n + 2$. Let $Z$ and $Q$ be chosen uniformly from $\mathcal{B}_\pi$ conditioned on
$Q[j] = b_1$, $Z[j] = b_2$ and $Z[i] = Q[i] = a$. We will apply the results of Lemma 6 to
successive blocks of size two. That is, by Theorem 7 we can view $\{Z[\ell]\}_{\ell=i}^{j}$ and
$\{Q[\ell]\}_{\ell=i}^{j}$ as Markov chains, and so $\{Z[2\ell]\}_{i \le 2\ell \le j}$ and
$\{Q[2\ell]\}_{i \le 2\ell \le j}$ are also Markov chains. We will try to force these two
chains to coalesce. Since they are Markov chains, if $\tau$ is the (random) coalescence
time, we have by Lemma 4

$$\|\mathcal{L}(Z[i+2\ell]) - \mathcal{L}(Q[i+2\ell])\|_{TV} \le P[\tau > \ell]$$

The minorization described in Lemma 6 implicitly describes a one-step coupling of
the two chains, which is the coupling we use. The lemma bounds the probability that
they coalesce in each step. Since $\tau > \ell$ only if coalescence fails at each of the
first $\ell$ steps, this proves the lemma. □

As an aside, Theorem 6.1.f of [9] gives a much better mixing estimate in the case
that $\pi$ is uniform. Their estimate is based on finding all of the eigenvalues of the
limiting Markov kernel as $n$ goes to $\infty$, which is not practical in the general case.
Lemma 20 of [27] gives intermediate bounds in the case that $\pi$ is monotone but not
uniform.
We will use the sub-diagonal representation of tridiagonal matrices, which is analogous
to the representation in equation (1), to find a different improvement. Note that
$R(\ell) \ge \frac{4}{27}$ if $\pi(\ell+3) \ge \pi(\ell+2)$. An analogous bound for mixing of
the sub-diagonal entries holds, with $R(\ell) \ge \frac{4}{27}$ if
$\pi(\ell+3) \le \pi(\ell+2)$. This leads to the following corollary:

Corollary 11. Let $Z$ and $Q$ be as in Lemma 5. Then

$$\|\mathcal{L}(Z[i+4\ell]) - \mathcal{L}(Q[i+4\ell])\|_{TV} \le \left(1 - \frac{4}{27}\right)^{\ell}$$

We briefly sketch the argument. We will look at the Markov chains
$\{Z[i+4\ell]\}_{0 \le 4\ell \le j-i}$ and $\{Q[i+4\ell]\}_{0 \le 4\ell \le j-i}$. By symmetry
the bound in Lemma 5 also applies to the parameterization of $\mathcal{A}_\pi$ by
subdiagonal entries, with a term of the form $16\min(1, \frac{\pi(i+2)}{\pi(i+3)})$
in place of the term $16\min(1, \frac{\pi(i+3)}{\pi(i+2)})$ from Lemma 5. That is, the ratio
of successive terms of $\pi$ is flipped. Thus, in each block of size 4, we can attempt a
coalescing step using the one-step coupling described in Lemma 6 or the analogous version
for sub-diagonal entries. At least one of these has a success probability of at least
$\frac{4}{27}$. The blocks are of length 4 rather than 2 to allow space to switch between
the coupling in the super- and sub-diagonal representations, as the 1-step couplings are
different. □

With this corollary, we can now prove our contraction estimate for starting
distributions $X_0$, $Y_0$ differing at the single entry $j$ satisfying $k < j < n - k$.
Assume the block $\{j+1, \ldots, j+k\}$ is being updated at time 0. The method is to choose
entries inside the blocks in groups of size four, sequentially conditioned upon the
endpoints of the large block. They are coupled as described in the proof of Lemma 5. As
shown in Corollary 11, at each such step, $X_1$ and $Y_1$ couple with probability at least
$\frac{4}{27}$. Thus, the expected increase in distance is at most
$4\sum_{i \ge 0} (\frac{23}{27})^i = 27$, uniformly in $k$.
Under this coupling, then,

(10) $E[d(X_1,Y_1) \mid d(X_0,Y_0) = 1, X_0[j] \ne Y_0[j]] \le 1 - \frac{1}{n+2w-2}(k - 54)$

for $k < j < n - k + 1$. By the same argument, for $j \le k$ or $j \ge n - k + 1$, we have

(11) $E[d(X_1,Y_1) \mid d(X_0,Y_0) = 1, X_0[j] \ne Y_0[j]] \le 1 - \frac{1}{n+2w-2}(w - 54)$

Choosing $k = w = 55$, by inequalities (10) and (11) we have for any $1 \le j \le n$

(12) $E[d(X_1,Y_1) \mid d(X_0,Y_0) = 1, X_0[j] \ne Y_0[j]] \le 1 - \frac{1}{n+108}$

And so for $n > 108$, by Theorem 3 above,

$$E[d(X_t,Y_t)] \le n\left(1 - \frac{1}{2n}\right)^t$$

By Lemma 4 and then Markov's inequality,

$$\|\mathcal{L}(X_t) - U_\pi\|_{TV} \le P[d(X_t,Y_t) \ge 1] \le n\left(1 - \frac{1}{2n}\right)^t$$

proving the upper bound.
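The arithmetic behind the upper bound can be checked numerically. The sketch below assumes the one-step contraction factor $1 - \frac{1}{n+108}$ from inequality (12) and a diameter of at most $n$ for the Hamming-like metric; the function names are hypothetical. It confirms that after $\lceil 2cn\log n \rceil$ steps the path-coupling bound has dropped below $n^{1-c}$ whenever $n > 108$.

```python
import math


def path_coupling_bound(n, t):
    """Diameter times the t-th power of the assumed contraction factor."""
    alpha = 1.0 - 1.0 / (n + 108)
    return n * alpha ** t


def steps_needed(n, c):
    """Number of steps 2 c n log(n), rounded up to an integer."""
    return math.ceil(2 * c * n * math.log(n))


def bound_holds(n, c):
    """True when the bound at that time is at most n^(1-c)."""
    return path_coupling_bound(n, steps_needed(n, c)) <= n ** (1 - c)
```

The check works because $1 - x \le e^{-x}$ and $\frac{1}{n+108} > \frac{1}{2n}$ for $n > 108$, so the bound is at most $n e^{-t/(2n)}$.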

The lower bound follows from the usual 'coupon collector' problem. Since our
walk is over a continuous space, the total variation distance to stationarity of the
Markov chain at time $t$ must be at least the probability that not all coordinates
have been updated by time $t$. Since only $k$ coordinates are updated at a time,
the classical coupon-collector results in [13] tell us that at time
$T = \frac{1}{k}n(\log n - c)$,
$\sup_{A} |K^T(x,A) - U_\pi(A)| \ge 1 - \exp(-\exp(c)) + o(1)$ as $n$ goes to infinity. □

4. Cutoff Preliminaries

We now return to the main problem, determining when sequences of random chains
exhibit cutoff. The proof will rely on results in Section 3 and their analogues in Section
6 of [9]. The approach makes heavy use of coupling in order to apply the results of [9]
to much less symmetric objects. Both [9] and this paper rely heavily on Theorem 1
above, which states that there is cutoff for a sequence of birth and death chains if
and only if the product of the mixing time and spectral gap goes to infinity. So, the
problem reduces to finding good bounds on the mixing time and spectral gap of these
chains.

We start by recalling a few results giving good estimates of the mixing time and
spectral gap for birth and death chains. Let $m$ be the median of $\pi$; for $\pi$ symmetric
this will be $\frac{n}{2}$. From [21], the spectral gap $(1 - \lambda_n(K))$ for a BD chain
with transition kernel $K$ satisfies

J^<(1-A„(/0)<| 

where $B = \max(B_+(m), B_-(m))$ and



(13) 



/ " 1 \ 

B+im) = max > , , — > 7r(y) 

X 



(14) = max ( — —^7 ; — ^iy^7r(?/) 

\y=m+l ' ■< a ) J yyrc 



(15) 



B_(m) = max ( , , — | iT(y) 



The mixing time $\tau_{mix}$ can be estimated by $\max(E_0[T_m], E_n[T_m])$, where
$E_i[T_j]$ is the expected hitting time of $j$ for the birth and death chain started at
position $i$. Two results are needed. The first is that there exist universal functions
$C_1(\delta)$, $C_2(\delta)$ so that for quantile functions
$m(\delta) = \inf\{k : \sum_{j \le k} \pi(j) \ge \delta\}$,

(16) $C_1(\delta)\max(E_0[T_{m(\delta)}], E_n[T_{m(1-\delta)}]) \le \tau_{mix} \le C_2(\delta)\max(E_0[T_{m(\delta)}], E_n[T_{m(1-\delta)}])$

for any $\frac{1}{2} < \delta < 1$. This is an immediate corollary of Theorem 1.1 of [24]. The
inequalities (16) will be used to find the rough order of the mixing time by looking
instead at the hitting times.

The second theorem deals with locating cutoff when it occurs. In [10], it is shown
that if a sequence of chains with mixing profiles $\tau_n(\epsilon)$ and median functions
$m_n(\delta)$ exhibits cutoff, then

n^oo max(Eo[Tm„(5)],-E„[T^^(i_5)]) 

This will be used to estimate cutoff location in the few cases it is possible to prove 
the existence of cutoff. 

The other basic result is the following classical formula for expected hitting times
(see e.g. [15], [23]):

(19) $E_i[T_j] = \sum_{v=0}^{j-i-1} \frac{1}{\pi(i+v)K[i+v,i+v+1]} \sum_{y \le i+v} \pi(y)$
Thus, estimating both the mixing and relaxation time reduces to estimating the
weighted harmonic sums of the super-diagonal entries of the transition matrix found
in equations (13) and (18). This requires some information about the marginal
distribution of each entry, a theorem showing that these entries are not too dependent on
each other, and finally some sort of invariance principle. The information about the
marginal distributions is in Lemmas 8 and 10, with a more precise bound given by
Theorem 6.1.f of [9] for specific chains. The information about mixing is in Lemma
5, and again Theorem 6.1.i of [9] gives more precise information for specific chains.
The invariance principle will be Theorem 3.4 of [2], which requires some preparation
to state.
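The weighted harmonic sums mentioned above are easy to evaluate for a concrete chain. The sketch below assumes the standard form of the hitting-time identity for a reversible birth and death chain, namely that the expected time to move from state $m$ to $m+1$ is $\frac{1}{\pi(m)K[m,m+1]}\sum_{y \le m}\pi(y)$, and cross-checks it against a first-step recursion. States are 0-indexed, `up[m]` stands for the super-diagonal entry out of state `m`, and all names and example values are hypothetical.

```python
def hitting_time_formula(pi, up, i, j):
    """Expected time to hit j from i < j via the weighted harmonic sum."""
    total = 0.0
    for m in range(i, j):
        total += sum(pi[: m + 1]) / (pi[m] * up[m])
    return total


def hitting_time_recursion(pi, up, i, j):
    """Same quantity from first-step analysis.

    With h[m] the expected time to go from m to m+1, conditioning on the
    first step gives up[m] * h[m] = 1 + down[m] * h[m-1].
    """
    down = [0.0] + [pi[m - 1] / pi[m] * up[m - 1] for m in range(1, len(pi))]
    total, h_prev = 0.0, 0.0
    for m in range(j):
        h = (1.0 + down[m] * h_prev) / up[m]
        if m >= i:
            total += h
        h_prev = h
    return total


# Hypothetical example chain on four states.
pi = [0.1, 0.2, 0.3, 0.4]
up = [0.5, 0.4, 0.3]
```

Small super-diagonal entries weighted by small values of $\pi$ dominate the sum, which is why the tail behaviour of $1/K[i,i+1]$ drives the limit theorems that follow.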

Recall that a stochastic process $X_n$ is called strictly stationary if
$(X_{i_1}, X_{i_2}, \ldots, X_{i_k}) = (X_{i_1+s}, X_{i_2+s}, \ldots, X_{i_k+s})$ in distribution
for all $s,k \ge 1$ and all $i_1 < i_2 < \cdots < i_k$.
Such a sequence is called jointly regularly varying with index $\alpha$ if, for all
$k \ge 1$, there exists a random measure $\theta_k$ on the sphere
$S^{k-1} = \{(v_1, \ldots, v_k) \mid \sum_{i=1}^{k} v_i^2 = 1\}$ such that
for all $u \in (0,\infty)$ and measurable $A \subset S^{k-1}$, writing $X = (X_1, \ldots, X_k)$,

$$\lim_{x \to \infty} \frac{P[\|X\|_2 > ux,\ X/\|X\|_2 \in A]}{P[\|X\|_2 > x]} = u^{-\alpha}\theta_k(A)$$

This condition is hard to verify directly for the stationary sequence used in this paper.
Theorem 2.1 of [3] gives a useful sufficient condition:

Theorem 12 (Condition for Joint Regular Variation). Let $X_n$ be a strictly stationary
stochastic process. If there exists another process $Y_n$ such that

• $P[|Y_0| > y] = y^{-1}$ for $y \ge 1$

• The following limit holds in finite-dimensional distribution:

$$\lim_{x \to \infty} \{(x^{-1}X_n)_{n \in \mathbb{Z}} \mid |X_0| > x\} = Y_n$$

then $X_n$ is jointly regularly varying with exponent 1.

For a process $Y_n$ of this form, define $\Theta_n = \frac{Y_n}{|Y_0|}$ to be the tail process
of $Y_n$ (and also of $X_n$).

Fix some sequence $a_n$ satisfying $\lim_{n \to \infty} nP[X_1 > a_n] = 1$. Say that a
stochastic sequence $X_n$ satisfies condition 1 if there exists a positive integer sequence
$r_n$ so that $r_n \to \infty$, $n^{-1}r_n \to 0$, and for all $u > 0$,

$$\lim_{m \to \infty} \limsup_{n \to \infty} P\left[\max_{m \le |i| \le r_n} |X_i| > ua_n \,\Big|\, |X_0| > ua_n\right] = 0$$

Define $\mathcal{F}_a^b$ to be the $\sigma$-algebra generated by $\{X_i\}_{a \le i \le b}$. Say
that a strictly stationary sequence $X_n$ satisfies condition 2 if

$$\lim_{n \to \infty} \sup_{E \in \mathcal{F}_{-\infty}^{0},\ F \in \mathcal{F}_{n}^{\infty}} |P[E \cap F] - P[E]P[F]| = 0$$

This condition is known in the literature as strong mixing. Say that $X_n$ satisfies
condition 3 if, for all $\delta > 0$,

$$\lim_{u \to 0} \limsup_{n \to \infty} P\left[\max_{0 \le k \le n} \left|\sum_{i=1}^{k}\left(\frac{X_i}{a_n}1_{|X_i| \le ua_n} - E\left[\frac{X_i}{a_n}1_{|X_i| \le ua_n}\right]\right)\right| > \delta\right] = 0$$

By Proposition 3.7 of [2], $X_n$ satisfies condition 3 if it is $\rho$-mixing. That is, if

$$\lim_{n \to \infty} \sup\{|E[YZ] - E[Y]E[Z]| : Y \in L^2(\mathcal{F}_{-\infty}^{0}),\ Z \in L^2(\mathcal{F}_{n}^{\infty}),\ \|Y\|_2 = \|Z\|_2 = 1\} = 0$$

Theorem 3.4 of [2] states that

Theorem 13 (Functional Limit Theorem for Mixing Sequences). Let $X_n$ be a strictly
stationary stochastic process which is jointly regularly varying with exponent 1.
Further, assume that it satisfies conditions 1, 2 and 3. Let $\Theta_n$ be its tail process,
and assume that $\Theta_n$ almost surely has no two values of opposite sign. Then the process

$$V_n(t) = \sum_{k=1}^{\lfloor nt \rfloor} \frac{X_k}{a_n} - \lfloor nt \rfloor E\left[\frac{X_1}{a_n}1_{|X_1| \le a_n}\right]$$

converges in the $M_1$ topology on $D[0,1]$ (see [?] for a definition of this topology) to
an $\alpha$-stable Levy process with Levy triple $(0, \nu, b)$ (see pp. 150 of [23] for a
definition of a process with given Levy triple), where $\nu$ is given by:

$$\lim_{u \to 0} \nu^{(u)} = \nu$$

and where, for $x > 0$, $\nu^{(u)}$ is given by:

$$\nu^{(u)}(x, \infty) = u^{-\alpha}P\left[u\sum_{i \ge 0} Y_i 1_{|Y_i| > 1} > x,\ \sup_{i \le -1}|Y_i| \le 1\right]$$

$$\nu^{(u)}(-\infty, -x) = u^{-\alpha}P\left[u\sum_{i \ge 0} Y_i 1_{|Y_i| > 1} < -x,\ \sup_{i \le -1}|Y_i| \le 1\right]$$

and $b$ is given by

$$b = \lim_{u \to 0}\left(\int_{x : u < |x| \le 1} x\,\nu^{(u)}(dx) - \int_{x : u < |x| \le 1} x\,\mu(dx)\right)$$

where $\mu$ is given by

$$\mu(dx) = \left(p1_{(0,\infty)}(x) + q1_{(-\infty,0)}(x)\right)\alpha|x|^{-\alpha-1}\,dx$$

and, finally,

$$p = \lim_{x \to \infty} \frac{P[X_1 > x]}{P[|X_1| > x]}, \qquad q = \lim_{x \to \infty} \frac{P[X_1 < -x]}{P[|X_1| > x]}$$

Almost all of the conditions for this theorem are close to holding for the stochastic process $X_i = B[i]$ for $B$ chosen uniformly from $\mathcal{B}_\pi$. The one exception is stationarity; $X_i$ must be a time-homogeneous Markov chain for the theorem to apply. If $\pi(i) = a^{i-1}\frac{1-a}{1-a^n}$, then $X_i$ will 'look like' a stationary time-homogeneous chain for entries far from 1 and $n$ (see the limiting process defined in [9] for a precise example). Otherwise, the Markov chain will not be time-homogeneous, never mind close to stationarity. To get around this problem, we will couple the process of interest, $X_i = B[i]$, to the stationary limiting process $Z_i$ described in Section 6 of [9].

To review that paper, the process $Z_i$ is the weak limit of the stochastic processes $Z_j^{(n)} = A_n[j]$, where $A_n$ is drawn uniformly from $\mathcal{B}_{(\frac{1}{n}, \ldots, \frac{1}{n})}$. By Theorem 6.5 of [9], this is time-homogeneous with stationary distribution given by the cumulative distribution function $P[Z_j \le x] = x + \frac{1}{\pi}\sin(\pi x)$, and transition kernel

\[ P[Z_{j+1} \le z | Z_j = x] = \frac{\sin\left(\frac{\pi}{2} \min(z, 1-x)\right)}{\sin\left(\frac{\pi}{2}(1-x)\right)}. \]

Theorem 13 will generally be applied to the process $Z_j$, and a coupling will be found to compare $X_j$ and $Z_j$. The exception is that Theorem 13 may be applied directly to $X_i$ when $\pi$ is of the special form given above. Although Theorem 13 will be applied to $Z_j$ more often than $X_j$ in this paper, many of the bounds depend only on Lemmas 8 and 10 and Corollary 11. Since $Z_j$ is a limit of chains drawn from $\mathcal{B}_{(\frac{1}{n}, \ldots, \frac{1}{n})}$, it is easy to check by taking limits that all of these lemmas apply to $Z_j$ as well.
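As a rough numerical illustration of the limiting chain, the following Python sketch simulates it by inverse-CDF sampling and compares the empirical distribution with the stationary law. It assumes the stationary CDF $x + \frac{1}{\pi}\sin(\pi x)$ on $[0,1]$ and the conditional CDF $\sin(\frac{\pi}{2} z)/\sin(\frac{\pi}{2}(1-x))$ on $[0, 1-x]$, as read from Theorem 6.5 of [9]. The function names, the burn-in length and the starting point are illustrative choices and not part of the original argument.

```python
import math
import random


def stationary_cdf(x: float) -> float:
    """Assumed stationary CDF of the limiting chain on [0, 1]."""
    return x + math.sin(math.pi * x) / math.pi


def step(x: float, u: float) -> float:
    """One transition from state x, driven by a uniform variate u in [0, 1).

    Inverts the assumed conditional CDF
    P[Z' <= z | Z = x] = sin(pi z / 2) / sin(pi (1 - x) / 2) on [0, 1 - x].
    """
    return (2.0 / math.pi) * math.asin(u * math.sin(math.pi * (1.0 - x) / 2.0))


def sample_path(length: int, rng: random.Random, burn_in: int = 100) -> list:
    """Simulate the chain from x = 0.5 and drop the first burn_in steps."""
    x = 0.5
    path = []
    for t in range(burn_in + length):
        x = step(x, rng.random())
        if t >= burn_in:
            path.append(x)
    return path


def empirical_cdf(path: list, x: float) -> float:
    """Fraction of simulated values that are at most x."""
    return sum(1 for z in path if z <= x) / len(path)
```

Consecutive values of the simulated path always satisfy $Z_j + Z_{j+1} \le 1$ (up to rounding), which is the constraint a tridiagonal doubly stochastic matrix places on neighbouring super-diagonal entries.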



In the remainder of this section, we will show that Theorem 13 applies to the sequence $Z_j$, and then prove a comparison between $X_j$ and $Z_j$. First, the application to $Z_j$.



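Theorem 14 (Limiting Process for Harmonic Sum of Super-Diagonal Entries). Let $Z_j$ be the limiting process described in Section 6 of [9]. Then the renormalized process

\[ V_n(t) = \sum_{k=1}^{\lfloor nt \rfloor} \frac{1}{2 n Z_k} - \lfloor nt \rfloor \int_{\frac{1}{2n}}^{1} \frac{1}{n y} \cos^2\left(\frac{\pi}{2} y\right) dy \]

converges in the $M_1$ topology to a Lévy process with Lévy triple $(0, \nu, 0)$ where, for $x > 1$,

\[ \nu(x, \infty) = \frac{1}{x}. \]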

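First, some initial calculations. By Theorem 6.5 of [9], the sequence $a_n = 2n$ satisfies

\[ \lim_{n \to \infty} n P\left[\frac{1}{Z_1} > a_n\right] = \lim_{n \to \infty} n \int_0^{\frac{1}{2n}} 2\cos^2\left(\frac{\pi}{2} y\right) dy = 1. \]

So this sequence $a_n$ satisfies the requirements of Theorem 13. The term $\int_{\frac{1}{2n}}^{1} \frac{1}{ny}\cos^2\left(\frac{\pi}{2} y\right) dy$ in the statement of Theorem 14 comes from explicitly calculating $E\left[\frac{1}{2nZ_1} 1_{\left|\frac{1}{Z_1}\right| \le 2n}\right]$, based again on Theorem 6.5 of [9]. Note that asymptotically, this is approximated by

\[ E\left[\frac{1}{2nZ_1} 1_{\left|\frac{1}{Z_1}\right| \le 2n}\right] = \frac{\log(2n)}{n} + O\left(\frac{1}{n}\right). \]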
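The two calculations above can be checked numerically. The sketch below assumes the stationary CDF $x + \frac{1}{\pi}\sin(\pi x)$ for $Z_1$ (equivalently, density $2\cos^2(\frac{\pi}{2} y)$ on $[0,1]$); it evaluates $nP[1/Z_1 > 2n]$ in closed form and the truncated mean by a midpoint rule after the substitution $y = e^t$. The function names and the number of quadrature steps are illustrative assumptions.

```python
import math


def stationary_cdf(x: float) -> float:
    """Assumed stationary CDF of Z_1 on [0, 1]."""
    return x + math.sin(math.pi * x) / math.pi


def scaled_tail(n: int) -> float:
    """n * P[1/Z_1 > 2n], which equals n * P[Z_1 < 1/(2n)]."""
    return n * stationary_cdf(1.0 / (2.0 * n))


def truncated_mean(n: int, steps: int = 20000) -> float:
    """E[(1/(2 n Z_1)) 1{1/Z_1 <= 2n}] = int_{1/(2n)}^1 cos^2(pi y/2)/(n y) dy.

    With y = exp(t) the integrand becomes cos^2(pi exp(t)/2) / n on
    [-log(2n), 0]; a midpoint rule is used on that interval.
    """
    lower = -math.log(2.0 * n)
    width = -lower / steps
    total = 0.0
    for k in range(steps):
        t = lower + (k + 0.5) * width
        total += math.cos(0.5 * math.pi * math.exp(t)) ** 2
    return total * width / n
```

Under these assumptions `scaled_tail(n)` approaches 1 as `n` grows, and `n * truncated_mean(n)` stays within a bounded distance of `log(2n)`, matching the $\frac{\log(2n)}{n} + O(\frac{1}{n})$ approximation.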



Next, it is necessary to show that $Z_j$ follows conditions 1, 2 and 3. For condition 1, set $r_n = \sqrt{n}$. Then by Lemmas 8 and 10,

3 

lim limsupP[ sup \Zi\ > un \ Zq> un] < lim limsup(v^ — "W-) — 

m-^oo m<i<rn m->oo UU 

< lim — 

m~^oo Aum 

= 
so condition 1 is satisfied. Condition 2 is an immediate consequence of Lemma 11. Condition 3 is, as mentioned, implied by $\rho$-mixing, which is an immediate consequence of Theorem 6.5 of [9]. Next, we describe the regular variation of $Z_j$. First, we have:



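Lemma 15 (Regular Variation). Fix $\pi$, and let $K$ be chosen uniformly from $\mathcal{A}_\pi$. Under the uniform distribution, the random variable $\frac{1}{K(i,i+1)}$ is regularly varying with exponent 1 for all $0 < i < n-1$.

Proof. By Lemma 8, $f(x) = xP[K(i,i+1) < \frac{1}{x}]$ is a monotone increasing function in $x$. By Lemma 10, $f(x)$ is bounded above by some constant $C$. Thus, $\lim_{x \to \infty} f(x)$ exists; call this value $\beta$. The next step is to show that

\[ \lim_{x \to \infty} \frac{a x P\left[\frac{1}{K(i,i+1)} > ax\right]}{x P\left[\frac{1}{K(i,i+1)} > x\right]} = 1 \]

for each $a > 0$.

Fix $a > 0$ and level of approximation $\delta > 0$. Then choose $X_{a,\delta}$ so that for $x > X_{a,\delta}$, $|xP[K(i,i+1) < \frac{1}{x}] - \beta|$ and $|axP[K(i,i+1) < \frac{1}{ax}] - \beta|$ are both less than $\epsilon = \min\left(\frac{\beta}{3}, \frac{\beta\delta}{4}\right)$. Then

\[ 1 - \delta \le 1 - \frac{3\epsilon}{\beta} \le \frac{\beta - \epsilon}{\beta + \epsilon} \le \frac{a x P\left[\frac{1}{K(i,i+1)} > ax\right]}{x P\left[\frac{1}{K(i,i+1)} > x\right]} \le \frac{\beta + \epsilon}{\beta - \epsilon} \le 1 + \frac{3\epsilon}{\beta} \le 1 + \delta, \]

which proves the lemma.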



We use this to show joint regular variation: 

Lemma 16 (Joint Regular Variation of Limiting Super-Diagonal Chain). Zj is jointly 
regularly varying, with a.s. nonnegative tail process. 

Define $Q_0$ to have the distribution $P[Q_0 > x] = \frac{1}{x}$ for $x > 1$, and define $Q_j = 0$ for $j > 0$. It is necessary to show only that this process satisfies the conditions of Lemma 12. By Lemmas 8 and 10, for $j \ge 1$, $P[Z_j > a | Z_0 > x] = O(a^{-1})$ as $x$ goes to infinity. By the union bound, then, for any $a > 0$ and any fixed collection of positive integers $J$ not containing 0, $P[\sup_{j \in J} Z_j > xa | Z_0 > x] = O\left(\frac{1}{xa}\right)$. For $j = 0$, Lemma 15 implies that $\lim_{x \to \infty} P[Z_0 > ax | Z_0 > x] = \frac{1}{a}$ for $a > 1$.

Note also that the process is almost surely nonnegative, and that $\{Q_n\}_{n=0}^{\infty}$ is the (nonrandom) tail process associated with $Z_j$. □



So all of the conditions of Theorem 13 are satisfied for the chain $Z_j$. The only remaining work is to calculate the values of the Lévy triple.




Define $Q_0$ to have the distribution $P[Q_0 > x] = \frac{1}{x}$ for $x > 1$, and define $Q_j = 0$ for $j > 0$. Then

\[ \nu(x, \infty) = \lim_{u \to 0} u^{-1} P\left[ u \sum_{i \ge 0} Q_i 1_{Q_i > 1} > x,\ \sup_{i \le -1} Q_i \le 1 \right] = \lim_{u \to 0} u^{-1} P[Q_0 > u^{-1} x] = \frac{1}{x} \]

for $x > 1$. This implies $\nu(-\infty, x) = 0$ for $x < 0$. It is immediate that $b = 0$. This proves the theorem. □

Finally, we prove a comparison result. For a fixed distribution vr, define M{i) = 
16min(l, ^^^t}^ ). Then: 

Lemma 17 (Coupling to Stationarity). Let $B$ be drawn uniformly from $\mathcal{B}_\pi$, and let $X_j = B[j]$. Then let $A$ be drawn uniformly from $\mathcal{B}_{(\frac{1}{n}, \ldots, \frac{1}{n})}$, and let $Z_j = A[j]$. It is possible to construct four chains $Y_j^{(i)}$, $1 \le i \le 4$, such that $Y_j^{(i)} = Z_j$ in distribution, and so that for indices $2 \le j \le n - 2$,



—M{2j)YiP < < 256M{2j)Y,f 
^M{2j + 1)Y,% < < 256M(2j + 1)Y,% 

otherwise. Note that the four chains $Y^{(i)}$ may not be independent.



Proof: By Lemmas 8 and 10,

\[ \frac{M(j)\beta}{256\alpha} \le \frac{P[\alpha X_j < x | X_{j-2}, X_{j+2}]}{P[\beta Y_j < x | Y_{j-2}, Y_{j+2}]} \le \frac{256 M(j)\beta}{\alpha}. \tag{21} \]

To build $Y^{(2)}$ step by step, begin by coupling $Y_0^{(2)}$ to $X_0$ so that they are both at the same quantile in their respective distributions. That is, choose $X_0 = x$ from its distribution, then set $Y_0^{(2)}$ to the unique value $a$ which satisfies $P[Y_0^{(2)} \le a] = P[X_0 \le x]$. Next, couple $X_{2j+2}$ and $Y_{2j+2}^{(2)}$ given $X_{2j}$ and $Y_{2j}^{(2)}$ so that they both have the same quantile in their respective conditional distributions. By the inequalities on line (21), they will satisfy the inequalities in the statement of this lemma. Do this until all even entries have been filled in. Then construct the odd entries of $Y_j^{(2)}$ conditionally on the even entries, independently of $X_j$; their values are not relevant to this lemma.

The remaining sequences $Y_j^{(1)}$, $Y_j^{(3)}$ and $Y_j^{(4)}$ are constructed the same way. □
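The quantile coupling used in this proof can be illustrated on a toy pair of distributions. The Python sketch below is not the construction of Lemma 17 itself: it couples a uniform variable $X$ on $[0,1]$ with a variable $Y$ having the assumed CDF $y + \frac{1}{\pi}\sin(\pi y)$, by feeding the same quantile $u$ to both inverse CDFs. Because the two CDFs satisfy $F_X(t) \le F_Y(t) \le 2F_X(t)$ on $[0,1]$, the coupled values satisfy $Y \le X \le 2Y$ pathwise, which is the mechanism by which a bound on a ratio of CDFs turns into a two-sided bound on coupled entries. All names and the bisection inverse are illustrative assumptions.

```python
import math
import random


def cdf_x(t: float) -> float:
    """CDF of a uniform variable on [0, 1]."""
    return min(max(t, 0.0), 1.0)


def cdf_y(t: float) -> float:
    """Assumed comparison CDF t + sin(pi t)/pi on [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return t + math.sin(math.pi * t) / math.pi


def inverse_cdf(cdf, u: float, iters: int = 60) -> float:
    """Invert an increasing CDF supported on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def coupled_pair(u: float) -> tuple:
    """Place X and Y at the same quantile u of their own distributions."""
    return inverse_cdf(cdf_x, u), inverse_cdf(cdf_y, u)
```

In the lemma the same idea is applied conditionally, one even index at a time, with the ratio bound (21) playing the role of the CDF comparison.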
Note that if finer control of $X_t$ by chains with the same distribution as $Y_t$ is necessary, it is possible to improve the factors of 256 in the statement of this lemma by using $2k$ chains $Y_t^{(1)}, \ldots, Y_t^{(2k)}$ and controlling entries $X_{kj+b}$ with the chains $Y_t^{(2b-1)}, Y_t^{(2b)}$. This can quantitatively improve many later bounds, and lead to improved bounds on cutoff location, but is not useful for proving existence or non-existence of cutoff. See section 2.11 of [27] for details.



By Lemma 17, for any nonnegative function h, 



n 

^ / /i(2j)M(2j) ^ /i(2j + l)M (2j + 1) 
V 256^2^ 256K 



(4) 
2j+l 



< 



^ / 256/i(2j)M(2j) 256/i(2j + l)M(2j + 1, , 

2-j y ^^(2) + ^^(4) I - 



2i 



X, 



This will allow us to approximate functionals of $X_j$ such as (13) and (16) by substituting $Y_j^{(i)}$ for $X_j$.



5. Cutoff Examples 
This section proves that random birth and death chains with 'IF' distribution (see [17] for a description of this distribution and some basic calculations) and binomial distribution can exhibit cutoff. We begin by looking at the IF distributions.



These distributions $\pi_n$ are symmetric on $[2n - 1]$. They satisfy $\pi(j) = c a^{-n + n^\epsilon + j - 1}$ for $1 \le j \le n - n^\epsilon$ and $\pi(j) = c$ for $n - n^\epsilon < j \le n$, where $c$ is a normalizing constant, $a > 1$, and $0 < \epsilon < 1$. Note that $c \asymp \frac{1}{n^\epsilon}$ for $n$ sufficiently large, in that it is uniformly within a multiplicative factor of 2, which is all that matters for the following calculations. Thus, $\pi(j) \asymp \frac{1}{n^\epsilon} a^{-n + n^\epsilon + j - 1}$ for $1 \le j \le n - n^\epsilon$. Let $K_n$ be a sequence of independent uniformly chosen birth and death chains with stationary distribution $\pi_n$. Then, fix a function $s : \mathbb{N} \to \mathbb{N}$. Theorem 18 below characterizes the existence of cutoff for $\frac{1}{2}(I + K_{s(n)})$ in terms of the growth rate of $s$.

It is worth taking a moment to understand heuristically the need for this function $s(n)$. It will turn out that, with probability about $\frac{1}{\log(n)}$, the expected hitting time of $n$ from 0 for the kernel $\frac{1}{2}(I + K_n)$ is comparable to the inverse of a single very small transition probability $K[i,i+1]$. When this occurs, the distribution of this hitting time is essentially given by the CDF of a geometric random variable with mean $\frac{1}{K[i,i+1]}$. Of course, this hitting time cannot possibly concentrate around its mean. For $s(n)$ growing slowly, this domination by a single small transition probability value will happen for infinitely many values of $n$, preventing cutoff. The theorem below indicates that for $s(n)$ growing quickly enough to avoid this particular obstacle, cutoff does happen. In that sense, this domination by a single small transition probability is the 'only' obstacle to cutoff. It is worth pointing out that this growth rate is very rapid: even $s(n) = 2^{n}$ doesn't allow cutoff, though $s(n) = 2^{n^2}$ does. The following theorem generalizes the results of section 7 of [9]:



Theorem 18 (Cutoff for IF Chains). Fix $a > 1$. For $\epsilon < \frac{1}{2}$ in the above sequence of chains, there is cutoff with probability 1 if $\sum_n \frac{1}{\log(s(n))} < \infty$. There is no cutoff with probability 1 if $\sum_n \frac{1}{\log(s(n))}$ diverges, or if $\epsilon > \frac{1}{2}$. In addition, for $\epsilon < \frac{1}{2}$ and any $\alpha, \beta > 0$,



lim P 

n—>oo 



I2V2 



TT 



nlog(n) 



< 



48 



First, assume $0 < \epsilon < \frac{1}{2}$ and fix $0 < \delta < 1$. Also, for $1 \le i \le 4$, define $Y_j^{(i)}$ as in the statement of Lemma 17. Then by equation (18),

5Z 'ZTn^KurVTri^'^^'^^ 



(22) El [Tn+Sn^ 

(23) 
(24) 

(25) 



-k(v)K\v, V + 1] 



V 

n—n 



f 'Kiv^KVu.V + 1] ^ 



^ ^ T^{v)K\v,v + 1] 



> 



Et 



a-1 /i:[t;,w + l] 



n — n 
2 



> 



1 



By the symmetry of $\pi$, $E_1[T_x] = E_{2n-1}[T_{2n-1-x}]$, so it is sufficient to look only at $E_1[T_x]$ for various values of $x$. Next, we look at the spectral gap. Again, it is enough to look at the quantity $B_-$ described in equation (13), since $B_- = B_+$. From equation (13), we can view $B_-$ as a supremum over different values of $x$, and we will look at two regions for the values of $x$.
Case 1: $x \le n - n^\epsilon$: In this case,



E 

^y=x 



1 



x-1 



n—n'^ 



T^{y)K[y,y + 1] 



E'fe) = E 



1 



y=x 

n-1 



+ E 



K[y,y + l] 
1 



1 — a 



-1 



1 — a 



1 — a 



K[y,y + l] 



< 2 max 

x<.y<.n—n'^ 



K\y,y+l] 



^ n—l 

1-a-i 5Z 



K[y,y + l] 
Rewriting, in case 1, all terms are at most
2 



max 



1 - i<y< 



f 1 \ ^ 1 1 



y + i] 



Next, 

Case 2: $x > n - n^\epsilon$: In this case,



^iy)K[y,y + 1] 



y=0 



1 - a-"+"'+i N ^ 



n—n—y_ 



1 



n— 1 



< 



y=n—n' 



^K[y,y + l] 



Thus, putting together the two cases, 
1 - 



(26) 



< max max 



1 



n 



n-l 

E 



1 



i<y<n-n^ \K[y , y + 1] ^ ' ^^^^ K[y, y + 1] 



Note that by Lemma 17 and the explicit stationary distribution for Zj given shortly 

1 



after the proof of Theorem [13 
(27) 



max 



<n K[x, X + 1] 

Next, it is necessary to look at 

n— 1 



> Cn 



< 



24 



P 



E 



1 



> Cn 



K\y,y + 11 

Using Lemma 17, a union bound, and changing C by a universal constant independent 
of n, a, and e, this is at most 



2P 



E;4>p«'- 



v=0 ^2v 



which is bounded by 
(28) 



v=0 ^ 2v 



n ^ n 

^-^ V^^l V^^l ^2v ^ „e log(n) 



v=0 ^ 2v 



-P 



sup — rpr > log(r;,) 



2v 



But direct calculation shows that 
and so, combining Markov's inequality with inequalities (27), (28) and (29),

r 



(30) 



^ logH^ ^ 2 



Cn 



l-2e 



\og{n) 



In order to obtain quantitative bounds on which growth rates for s{n) result in 
cutoff, we require the following quantitative bound on the growth rate of — pj: 

Lemma 19 (Medium Deviations for Levy Sums). 

1 1 



P 



< 



Y 



V2: 



-n log(ra) — Cn 



O 



O 



c 



As shown in section 7.1 of |9], it is possible to find a sequence of iid random variables 
Q with finite exponential moments so that 

v=0 ^2v j=l J 

where Vj is a sequence of iid random variables distributed uniformly on [0, |], T = 
sup{i : (i < [^^J}, and where furthermore Q is stochastically dominated by an 
exponential variable with mean By section 4 of 



16 



P 



T ~1 

X:l<2Tlog(T)-CT =e(i)+0(i) 



Thus, 



71-1 

2 



y — 

v=0 ^ 2v 



< 



n log(n) — Cn 



< P 



+ P 



^ 1 ^V2n. .V2n, 
2^ — < log( 



j=i 1 



An 



Cn 



%/2n 

4:7V -I 



o 



0(^)+0(e--") 



for some 7 > 0. □. 



Applying inequalities (26), (27) and (30) to inequality (13), we find: 



(31) 



P 



[1 - K) < 



nAr 



O 



+ 



log(n) 



n 



l-2e 
(32)

Then, applying inequality (22) and Lemma 19 to inequality (17), we find:

1 



-nlog(?2) — BnU 



12v^7r 

as An, En go to CX). Assume Y.n \og(s(n)) 



o 



1 

Br, 



o 



n log(s(n)) 



Un SO that Yl 
Then, by inequahties (31) and (32) 

P tJI - Xn) < 



converges. Then there exists some sequence 
converges and hm^-^ooi^n = oo. So, set Bn = An = ^"^'■"^ . 



o 



log(ra) 



12v^7r 

By Borel-Cantelh, the event that r„(l — A„,) < j^^^ — 1 occurs only finitely often, 
and so by Theorem [T] there is cutoff. 

Next, we must show that cutoff doesn't occur in the other cases. We start with the 
case iog(s(n)) ~ "^^^ ^^^^ ^^^P ^ lower bound on the expected hitting times: 
(33) 



Eo[T, 



< 



12 



0<2v<n-n^ 



(3) 



2v+l, 



+K+1) E 

n—n'^ <2v<n+5n^ 



1 



'^2v ~r J--) 



2v+l , 



and a corresponding upper bound on B_: 

1 



(34) 



S > 



6(1 



max 

n—n'^<2v<n 



(1) 



2v 



Fix some constant D to be determined later. For n — rf + D log(n) < 2j < n + drf — 
Dlog(n), let Aj be the event that > nlog(?T,). 



Then, conditioning on the event Aj, let At 



'2t+l 



Bt 



(1) ' 



We will couple these two chains to stationary versions, denoted by $\hat{A}_t$ and $\hat{B}_t$. Our goal will be to better understand the behaviour of $A_t$, $B_t$ by showing that they agree with $\hat{A}_t$, $\hat{B}_t$ at all times more than distance $O(\log(n))$ from $2j$, with high probability. To construct our coupling, begin by choosing $A_{2j}$, $B_{2j}$, $\hat{A}_{2j}$ and $\hat{B}_{2j}$ independently from their respective distributions. We will then iteratively construct $(A_{2j+2\ell}, \hat{A}_{2j+2\ell})$ conditioned on $(A_{2j+2\ell-2}, \hat{A}_{2j+2\ell-2})$ according to the coupling used in the proof of Lemma 5. As in the proof of that lemma, this coupling has the property that if $A_{2j+2\ell} = \hat{A}_{2j+2\ell}$, then $A_{2j+2\ell'} = \hat{A}_{2j+2\ell'}$ for all $\ell' > \ell$. We will use the same iterative construction for the three other pairs $(A_{2j-2\ell}, \hat{A}_{2j-2\ell})$, $(B_{2j+2\ell}, \hat{B}_{2j+2\ell})$ and $(B_{2j-2\ell}, \hat{B}_{2j-2\ell})$.

Let $C_1 = \inf\{\ell : A_{2j+2\ell} = \hat{A}_{2j+2\ell}, \ell > 0\}$ and $C_2 = \inf\{\ell : A_{2j-2\ell} = \hat{A}_{2j-2\ell}, \ell > 0\}$ be the coupling times of $A_t$ with $\hat{A}_t$, and define the coupling times $C_3$, $C_4$ for the chains $B_t$, $\hat{B}_t$ analogously. Finally, let $T_{couple} = \max_{1 \le i \le 4}(C_i)$ and let $S_D$ be the event that $T_{couple} < D\log(n)$. Then, for $B$, $C$ some large constants to be determined,



25 



P[(l - a-')Eo[T^+sn^] < Cnlog{n)\A,] > P[{Y. [ Vai) + ] ^ ^00^'^°^^''^^ 

2v=l \^2v ^2v+l / 

2v=j+Dlog{n) \-^2v ^ 2v+l / 

2i;=j-Dlog(n) \ 2d 2i)+1 / 

j-Dlog(n) / 1 \ C 



< n log(n)| 



> P 



+ P 



+ P 



+ P 



2t;=l \-'^2i) -'^2t)+l 



2d=j+D log(n) \ 2t) ^ 2v+l . 

i+Dlog(n) / ^ ^ 



100 



2v=j-D\og{n) \-^2v ^ 2v+l 



(1)+^) 



jr-Dlog(n) / 

E 77(1) +77(1 

2?;=n-n* \ 2i) ^ 2v 



C 



(3) . < — nlogHjn^z^l^,- 

2t)+l 



We will now analyze these four terms, beginning with the last and most complicated. To simplify notation, let $I_j = \{x : n - n^\epsilon < 2x < n + \delta n^\epsilon, |j - 2x| > D\log(n)\}$ and let $H_j = \{x : n - n^\epsilon < 2x < n + \delta n^\epsilon, |j - 2x| \le D\log(n)\}$.



P 



j-Dlog(n) ^ 



2v=n—n'^ 2v 



> 1 -P 
-P[£^|A-] 

> 1 -O 



1 c 

^L^ yiX) - 200 ^ 



1 



22 \ Dlog{n)^ 



where the two terms in the last line are bounded by Lemma 19 and Corollary 11 respectively. The second term is bounded the same way.
By inequality (30), the third term is $O(n^{-\alpha})$ for some $\alpha > 0$ depending on $C$, for $C$ sufficiently large relative to $D$. Finally, the first term goes to 1 as S goes to infinity
by Theorem 14 and Lemma [5j So, for B sufficiently large and C sufficiently large 
relative to and for some A^^o and all n > Nq, 



(35) 



P[Eo[T^]<Cnlogin)\A,] > 



99 

Too 



Next, we need a lower bound on and an upper bound on P[^jn^j]. By Lemma 



1 



n log{n) 



. By Lemma 



and Lemma 



10 



for $|i - j| > 1$. In the case $|i - j| = 1$, the same conclusion holds by Lemma 8 and the comment immediately following Lemma 10.



Let B be the event {max(Ei[T„+5„.], E2„_i[T„„5„.]) < Dn log(^^)}^{X;l<2^,<„ ^2^^ + 
Fal+i < Dn login)}. Then 

P[U„_,.+Dlog(n)<2i<„+5n^-Blog(„)A- H i3] > ^ P[Aj H i3] - ^ P[A^ H A, H B] 

1 \ „ / 1 



= n 

and so in particular, 

P[U„,-n»+D log(n)<2j<n+Sn^-Dlog{n)^j 



log(n) 



O 



\og{n) 



-Dlog{n)^j r\B] = O 



4 log(n) 

By Borel-Cantelli, this implies that the event L}n-n^+Diogin)<2j<n+5n^-Diog{n)^j H B 
happens infinitely often when i^g(l(^^y^ = co. By inequalities (13) and (16), rs(„)(l — 
As(n)) < -D on this event. Thus, lim inf „^oo T^n) ( 1 — ^s{n)) < oo almost surely. 

If e > |, arguments identical to those in M show that the spectral gap is il{n~'^^) 
with high probability as n goes to infinity, similarly, their bound on mixing times 
shows that Tmix = fi(n^'' log(n)) with high probability as n goes to infinity. Thus, 
there is no cutoff. 



Finally, we use equation (17) to estimate the location of cutoff, when it occurs. The lower bound in the statement of the theorem follows from inequality (32), and the upper bound follows from inequality (33).

□ 

Next, we will look at the binomial distribution. Define a symmetric stationary distribution $\pi_n(x)$ on $[n] = \{1, \ldots, n\}$ by $\pi_n(x) = 2^{-n}\binom{n}{x}$. Let $K_n$ be a sequence of independent uniformly chosen birth and death chains with stationary distribution $\pi_n$. Then, for function $s : \mathbb{N} \to \mathbb{N}$, the following theorem describes the existence of cutoff for $\frac{1}{2}(I + K_{s(n)})$ in terms of the growth rate of $s$.

Theorem 20 (Cutoff for Binomial Distributions). The sequence of chains described exhibits cutoff if $s(n)$ grows sufficiently rapidly, and doesn't exhibit cutoff if $s(n)$ grows so slowly that:

\[ \sum_{n \ge 1} \frac{1}{\sqrt{s(n)}\,\log(s(n))^2} = \infty. \]

The first step is to show that there is cutoff if s{n) grows sufficiently rapidly. Begin 
by looking at the mixing time. By classical arguments (see e.g. chapter 18 of [Is]), 
the expected hitting time of from and of from n for the underlying 

Metropolis chain is of order nlog(n) + 0{n). Thus, by Lemma 17 and then Lemma 
V. 



(36) lim P[max(Eo[T ],E„[r ]) < -^?2log(n)^ - Cnnlog{n)] 

n->oo L 2 J I 2 I ZOO 

as long as Cn goes to oo along with n. Since vr([0, "^^^ ]) is uniformly bounded away 
from I for all n, max{Eo[Tn+yfE], En[Tu-^]) is a good approximation of the hitting 

time for our random chain by inequality (16). Note that in order to get a quantitative bound on a growth rate of $s(n)$ that would imply cutoff, it would be sufficient to get a quantitative bound such as Lemma 19 on the rate of convergence here.

Next, it is necessary to look at the spectral gap. Let $„(x) be the probability that 
flipping n fair coins will result in x or fewer heads, and let m be the median of vr 
(either | or depending on parity). Also let Zj = — ^ and Qj = — pj- be defined 

as in Lemma [171 Then 

(38) = max ^ -j—- — $„(x) 



m 



(39) < max ( ^ -^Zy<^^{x) + ^ —^Qy<^n{x) 

\2y=x \nj 2y=x \nJ 



Define 



m-l 



2y=x 

The next step is to bound Fn, and thus 5+. There are two cases. 
Case 1: (n — x)^ > ralogfn) 
m—1



x] 



2y=x 



m—1 ^ 
2y=x V 



In particular, sup^^(„_^)2>„iog(„) = 0(1) ^Z^^. Note that, by a union 

bound over y and the explicit stationary distribution for Zy given immediately after 



P[ sup Zy > n^-^] = O ( ^] 



Theorem 13 
(40) 

By direct computation and Lemma [TOt 

(41) E[Z,lz.<„i.5] <31og(n) 



Combining Markov's inequality with inequalities (|40j) and (41), then, 
(42) 

m—1 ^ 

P[ sup Fn{x) > Cn] <P[J2 -^Zylz^<ni.^ > Cn] + P[ sup Zy > n^-'] 



X : {n—x)^>nlog{n) 

(43) 

Case 2: {n — < nlog(n) 
In this case, 



O 



2y=x 

Clog(ra) 

n 



l<2y<Ti-l 



m—1 



X 



2y=x 



m—1 



0{Vn) 



'iy=n-^/n log(n) 



And SO sup^^(„_^)2<„i„g(„) Fn{x) = 0{^) YJ^^^^_^-^^Zy. Note that 



(44) 

and that 
(45) 



P 



sup Zy > a/ n log(r;,) log(n) 



O 



n—'^J n log(n)<2j;<n— 1 

E[Zjlz,<n^A = 0{\/n log(n) log(n)) 



1 



log(n) 
Combining (44) and (45) with Markov's inequality,

Fn{x) > n\og{nf-'] 



(46) 



P[ sup 

X : {n—x)^<nlog{n) 



o 



\og{n] 



0.5-e 



for any < e < |. Choose e = ^. Combine inequahties (42), (46) and (37) with 



inequahty (13) to find that hm„_ 



> n\og{n 



il.5l 



0. Combining this with 



the bound on $\tau_n$ given by (36) and (16) shows that cutoff occurs for $s(n)$ growing sufficiently quickly. Due to the weakness of inequality (36), we have no quantitative information about the growth rate required.
Next, it is necessary to show that there is no cutoff if $s(n)$ grows sufficiently slowly.

^ > n log(n)^. 



18 



let A-i be the event that „r. . , 



Analogously to the proof of Theorem 

for I — ^/rl < J < I — 1 . By the same argument as found in the proof of Theorem 
between inequalities (34) and (35), 



P[max{Eo[T^], En[T,_^]) < Cnlog(n)Vj] > 



99 
100 



for some C fixed and large enough and all n > Nq. By a calculation almost identical 
to that immediately following inequality (35), 

p[A,] = n 
p[Aj n Ai] = o 

and so 



n log(n)2 
1 

log(n)^ 



Since r„(l — A„) < C on the set (UjAj) fl jmaxf-Ep [T ^+^ j , En \T, 

2 

by Borel-Cantelli, there is no cutoff with probability 1 if 

□ 



n— -Jn 



n\og{n)'^ 

])<Cnlog(n)2}, 
diverges. 



/s(n) log(s(n)) 



6. Non-Cutoff by Comparison to Metropolis Chains 

This section includes a theorem relating cutoff in random BD chains to cutoff in non-random chains. Let $\pi_n$ be a sequence of stationary distributions on $[n] = \{1, 2, \ldots, 2n\}$ with $\frac{1}{4}$, $\frac{1}{2}$ and $\frac{3}{4}$ quantiles given by $u_n$, $m_n$ and $v_n$ respectively. We will compare a sequence $X_t^{(n)}$ of random BD chains with stationary distribution $\pi_n$, and the 'Metropolis' chain $Y_t^{(n)}$ with the same stationary distribution, which has transition kernel $M_n$ given by

\[ M_n[i, i+1] = \frac{1}{4}\min\left(1, \frac{\pi_n(i+1)}{\pi_n(i)}\right), \]

\[ M_n[i, i-1] = \frac{1}{4}\min\left(1, \frac{\pi_n(i-1)}{\pi_n(i)}\right). \]

This is the chain obtained by applying the Metropolis algorithm to the $\frac{1}{2}$-lazy simple random walk on the path. See [1] for a survey of the mathematical theory of this algorithm, which has been enormously influential in the application of MCMC methods to real problems.
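To make the Metropolis construction concrete, the following Python sketch builds the tridiagonal kernel for a given stationary distribution and can be used to verify detailed balance. It assumes the proposal moves one step up or down with probability $\frac{1}{4}$ each (the $\frac{1}{2}$-lazy simple random walk), indexes states from 0 rather than 1, and uses exact rational arithmetic; the function name and the `proposal` parameter are illustrative, not notation from the paper.

```python
from fractions import Fraction
from math import comb


def metropolis_kernel(pi, proposal=Fraction(1, 4)):
    """Tridiagonal Metropolis kernel on {0, ..., n-1} with stationary law pi.

    Off-diagonal entries are proposal * min(1, pi[j] / pi[i]) for |i - j| = 1;
    the diagonal absorbs the remaining mass so each row sums to one.
    """
    n = len(pi)
    kernel = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            kernel[i][i + 1] = proposal * min(1, pi[i + 1] / pi[i])
        if i > 0:
            kernel[i][i - 1] = proposal * min(1, pi[i - 1] / pi[i])
        kernel[i][i] = 1 - sum(kernel[i])
    return kernel


def binomial_weights(n):
    """Binomial(n, 1/2) probabilities on {0, ..., n}, as exact fractions."""
    return [Fraction(comb(n, k), 2 ** n) for k in range(n + 1)]
```

For example, `metropolis_kernel(binomial_weights(6))` returns a 7 by 7 stochastic matrix that is reversible with respect to the binomial weights, so it is one (non-random) element of the set of birth and death kernels with that stationary distribution.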

Next, define the functional in M* by 

v=l ^^''^"^ q=0 + ! "'^^ q=v 

Let BMet,n be the maximum value of the quantities 5+ and B_ defined by equation 
( 13 ) for the Metropolis chain. Then let x„ be a sequence satisfying one of the following 
inequalities for each n and some fixed a > 0: 

m„-l ^ 

J7\ ^1 — ? TT > aBMet,n 



Assume without loss of generality that it is the first of these two inequalities that is 
being satisfied from now on. Then define the functional in by 



9n{X) 



Assume without loss of generality that $E_0[T_{v_n}] \ge E_n[T_{u_n}]$ for the Metropolis chain, and call this larger quantity $\tau_{Met,n}$. By inequality (16), this is close to the actual mixing time of the Metropolis chain. Let $(1 - \lambda_{Met,n})$ be the spectral gap of the Metropolis chain. Define normalizing constants

Vn-l V 

v=0 ' q=0 

rrin-l ^ 

y=x„ "^^^ y<x„ 






and let the normalized functionals be given by $\bar{f}_n = \beta_{1,n} f_n$ and $\bar{g}_n = \beta_{2,n} g_n$. Then

Theorem 21 (Weak Comparison to Metropolis). Assume $\bar{f}_n \to f$ and $\bar{g}_n \to g$ in the M* topology, for some continuous functionals $f$ and $g$ and suitable normalizing sequences $\beta_{1,n}$ and $\beta_{2,n}$. Further assume that $x_n$, $m_n$, $n - v_n$, and $(m_n - x_n)$ all go to infinity as $n$ does. Then if $Y_t$ does not exhibit cutoff, $X_t$ does not either with probability 1.



The proof follows the same basic outline as found in section 7 of [9]. First, it is 
necessary to bound the expected hitting time from above. 

Lemma 22 (Hitting Time for Metropolized Chains). Under the conditions of Theorem 21, for any sequence $C_n \to \infty$,

\[ \lim_{n \to \infty} P\left[E_0[T_{v_n}] > 512\,\tau_{Met,n}(C_n + \log(n))\right] = 0. \]

Defining ¥2^'' as in Lemma 17 

1 

2j+l=0 '^n{'2i)Y2i^i j^Q 



£;o[T.J < 256^ ^7r„(j) + 256 



2i=0 



(3) Y.^n{3) 



j=0 



> 256(y^ 

^2i=0 ' j=0 



1 



1 



E 



1 



(1) 



2i 



.(1) 



<2n 




^2j+l=0 ' j=0 



2j+l=0 



7Tn{2t + 1) \ V 



(3) 
2i+l 



E 



(3) 



.(3) 



-<2n 



2i+l 2i+l 




The first term is at most 256TMet,n, while by Theorem 14 and the assumptions, the 



second term multiplied by converges in distribution to f{V), with V a Levy 
variable. The same bounds apply to the third and fourth terms respectively. □ 

Next, it is necessary to examine the spectral gap. Let (1 — A„) be the gap of the 
random chain. Then, following a similar calculation. 



1^ - 4 5Z K[y,y + l]7r{y) ^ """^^^ 
- ^1 1 (log(n) + ^„) 

48 1 — AMet,n 

where, again, $\psi_n$ converges in distribution to a Lévy variable. Thus, if $\lim_{n \to \infty} C_n = \infty$,

\[ \lim_{n \to \infty} P\left[\frac{1}{1 - \lambda_n} < \frac{\alpha}{48(1 - \lambda_{Met,n})}(\log(n) - C_n)\right] = 0. \tag{47} \]
Putting together Lemma 22 and inequality (47), and setting $C_n = \log(n)^{0.5}$, gives

\[ \lim_{n \to \infty} P\left[\tau_n(1 - \lambda_n) > \frac{24576}{\alpha}\,\tau_{Met,n}(1 - \lambda_{Met,n})\right] = 0. \tag{48} \]

Since there exists some constant $C$ so that $\tau_{Met,n}(1 - \lambda_{Met,n}) < C$ infinitely often, by the Borel-Cantelli Lemma and (48) the random chain satisfies $\tau_n(1 - \lambda_n) < \frac{24576}{\alpha} C$ infinitely often. □

Because of the slightly exotic metrics used in its definition, it might not be clear 

lustrate its applications, the following is 



that Theorem 21 can actually be used. To i 



a short second proof of the cutoff result in 
all 1 < i < n. Then set a;„ = t- It is clear t^ 



9j using this theorem. Set 7r„ 
lat 



for 



f-f- 



n 



is within a constant factor of the inverse spectral gap of the Metropolis chain, so the 
choice of Xn is allowed. We also note that m„ = ^ + 0(1), m„ = f + 0(1), and 
y„ = ^ + 0(1). Next, note that 



UX) = ±-xC-) + o( 

n 

^-^ n n 



lAI 



n 



n 



The last step is to show that 



/n(A) ^ / vX{v)dv 
Jo 

1 

gn{X) X{v)dv 

in the M* topology. It is clear that this convergence holds when applied to all continuous functions $\lambda \in D[0,1]$, and that these functions are dense in $D[0,1]$ under the $M_1$ topology, so it is sufficient to show that the limiting functions are continuous in M*. It is easiest to look at the limiting functional $g$, though the proof for $f$ is essentially identical. Let $\lambda_n \to \lambda$ in the $M_1$ topology. By Theorem 2.4.1 of [26] and an application of the triangle inequality, for all $\delta > 0$, there exists some $N$ such that for all $n > N$,

\[ \inf_{v - \delta < u < v + \delta} \lambda(u) - \delta < \lambda_n(v) < \sup_{v - \delta < u < v + \delta} \lambda(u) + \delta. \]
Let's look at only the upper bound, since the lower is identical.

3 3 1 

4 / 4 12 



(A„(f ) — X{v))dv < / 5dv + / ( sup X{u) — \{v))dv 

Jo Jo v-5<u<v+5 

The first term clearly goes to 0 as $\delta$ goes to 0, and the second goes to 0 by the dominated convergence theorem. Thus, integration is continuous, and the theorem can be applied in this case.

We conjecture that divergence of Y] rrzi — r for the Metropolis chain also 

implies lack of cutoff for the random chain, but have not been able to prove it. The 
simple conjecture just comes from assuming that unusually large values of don't 
tend to occur especially often with unusually large values of r„. In particular, this 
conjecture would hold if the sums defining the expected hitting times and spectral 
gap were all independent. Note also that this conjecture is based only on the spectral gap of the original chain, and as shown by Theorem 18, it cannot be sharp